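Multimodal demonstrations give robots an abundance of information for making sense of the world. However, this abundance does not always lead to good performance when learning sensorimotor control policies from human demonstrations. Extraneous data modalities can lead to state over-specification, where the state contains modalities that are not only useless for decision-making but can also change the data distribution across environments. State over-specification causes problems such as the learned policy failing to generalize outside the training data distribution. In this work, we propose Masked Imitation Learning (MIL) to address state over-specification by selectively using informative modalities. Specifically, we design a masked policy network with a binary mask that blocks certain modalities. We develop a bi-level optimization algorithm that learns this mask to accurately filter out over-specified modalities. We demonstrate empirically that MIL outperforms baseline algorithms in simulated domains, including MuJoCo and a robot arm environment using the Robomimic dataset, and that it effectively recovers the environment-invariant modalities on a multimodal dataset collected on a real robot. Our project website presents supplemental details and videos of our results at: https://tinyurl.com/masked-il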
translated by 谷歌翻译
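As a rough illustration of the masked policy idea above, the following minimal sketch zeroes out blocked modalities before a toy linear policy sees them. The modality names, the weights, and the helper functions `apply_mask` and `linear_policy` are hypothetical; the mask is fixed by hand here, whereas the abstract describes learning it with a bi-level optimization that is not shown.

```python
from typing import Dict, List

def apply_mask(state: Dict[str, List[float]], mask: Dict[str, int]) -> List[float]:
    """Flatten modalities in sorted-name order, zeroing those whose mask bit is 0."""
    flat: List[float] = []
    for name in sorted(state):
        keep = mask.get(name, 0)  # unknown modalities are blocked by default
        flat.extend(value * keep for value in state[name])
    return flat

def linear_policy(features: List[float], weights: List[float]) -> float:
    """Toy stand-in for a policy network: a single linear output."""
    return sum(w * f for w, f in zip(weights, features))

# Hypothetical mask: keep joint angles, block a brightness reading that
# varies across environments but is irrelevant to the task.
mask = {"joint_angles": 1, "camera_brightness": 0}
weights = [0.5, -0.25, 2.0]  # order: camera_brightness, joint_angles[0], joint_angles[1]

state_a = {"joint_angles": [1.0, 2.0], "camera_brightness": [0.1]}
state_b = {"joint_angles": [1.0, 2.0], "camera_brightness": [0.9]}

action_a = linear_policy(apply_mask(state_a, mask), weights)
action_b = linear_policy(apply_mask(state_b, mask), weights)
print(action_a, action_b)  # identical: the blocked modality cannot affect the action
```

Because the blocked modality is multiplied by zero, a shift in its distribution between environments cannot change the policy output, which is the property the mask is meant to provide.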
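We present a novel approach to relocalization, or place recognition, a fundamental problem in many robotics, automation, and AR applications. Rather than relying on appearance information, which is often unstable, we consider the case in which the reference map is given in the form of localized objects. Our localization framework relies on 3D semantic object detections, which are then associated with objects in the map. Sets of possible pairwise associations are grown by hierarchical clustering with a merge metric that evaluates spatial compatibility. This metric notably uses information about relative object configurations, which is invariant with respect to global transformations. The association sets are updated and expanded as the camera incrementally explores the environment and detects more objects. We test our algorithm in several challenging situations, including dynamic scenes, large viewpoint changes, and scenes with repeated instances. Our experiments demonstrate that our approach outperforms prior art in both robustness and accuracy.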
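The invariance mentioned above can be illustrated with a small check: the distance between two objects is the same in the map frame and in any rigidly transformed camera frame, so two detection-to-map associations can be tested for mutual consistency without knowing the camera pose. This is only a sketch under assumptions; the function `pair_is_compatible`, the tolerance, and the coordinates are hypothetical, and the merge metric actually used in the paper is not specified in the abstract.

```python
import math
from typing import Sequence

Point = Sequence[float]

def pair_is_compatible(det_a: Point, det_b: Point,
                       map_a: Point, map_b: Point,
                       tol: float = 0.1) -> bool:
    """Associations (det_a -> map_a) and (det_b -> map_b) are spatially
    compatible if the inter-object distance agrees in both frames."""
    return abs(math.dist(det_a, det_b) - math.dist(map_a, map_b)) <= tol

# Hypothetical map objects in the world frame.
chair_map = (0.0, 0.0, 0.0)
table_map = (3.0, 4.0, 0.0)
lamp_map = (0.0, 10.0, 0.0)

# The chair and table as detected in a camera frame that is rotated by
# 90 degrees about z and translated by (10, -2, 1) relative to the map.
chair_det = (10.0, -2.0, 1.0)
table_det = (6.0, 1.0, 1.0)

good = pair_is_compatible(chair_det, table_det, chair_map, table_map)
bad = pair_is_compatible(chair_det, table_det, chair_map, lamp_map)
print(good, bad)
```

A clustering step could then merge only association pairs that pass such a test, which is one plausible reading of "spatial compatibility".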
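The referring video object segmentation (RVOS) task aims to segment, in all frames of a given video, the object instances referred to by a language expression. Because it requires understanding cross-modal semantics for individual instances, this task is more challenging than traditional semi-supervised video object segmentation, in which the ground-truth object masks are given in the first frame. With the great success of Transformers in object detection and object segmentation, RVOS has made remarkable progress, with ReferFormer achieving state-of-the-art performance. In this work, building on the strong baseline framework ReferFormer, we propose several tricks to boost performance further, including cyclical learning rates, a semi-supervised approach, and test-time augmentation at inference. The improved ReferFormer ranks second in the CVPR 2022 Referring YouTube-VOS Challenge.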
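One of the tricks named above, a cyclical learning rate, can be sketched with the common triangular schedule, in which the rate rises linearly from a lower bound to an upper bound and back over each cycle. The abstract does not say which cyclical variant or which values were used, so the schedule shape, the bounds, and the name `triangular_lr` are assumptions.

```python
import math

def triangular_lr(step: int, base_lr: float, max_lr: float, half_cycle: int) -> float:
    """Triangular cyclical learning rate.

    The rate climbs from base_lr to max_lr over `half_cycle` steps,
    then descends back to base_lr over the next `half_cycle` steps.
    """
    cycle = math.floor(1 + step / (2 * half_cycle))
    x = abs(step / half_cycle - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

# Hypothetical bounds and cycle length, chosen only for illustration.
schedule = [triangular_lr(s, base_lr=1e-5, max_lr=1e-4, half_cycle=4) for s in range(9)]
for step, lr in enumerate(schedule):
    print(step, f"{lr:.2e}")
```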
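Video restoration aims to restore multiple high-quality frames from multiple low-quality frames. Existing video restoration methods generally fall into two extremes: they either restore all frames in parallel or restore the video frame by frame in a recurrent way, which leads to different merits and drawbacks. Typically, the former has the advantage of temporal information fusion but suffers from large model size and intensive memory consumption. The latter has a relatively small model size because it shares parameters across frames, but it lacks long-range dependency modeling ability and parallelizability. In this paper, we attempt to combine the advantages of both by proposing a recurrent video restoration transformer, RVRT. RVRT processes local neighboring frames in parallel within a globally recurrent framework, which achieves a good trade-off between model size, effectiveness, and efficiency. Specifically, RVRT divides the video into multiple clips and uses the previously inferred clip feature to estimate the subsequent clip feature. Within each clip, the features of different frames are jointly updated through implicit feature aggregation. Across clips, guided deformable attention is designed for clip-to-clip alignment: it predicts multiple relevant locations from the whole inferred clip and aggregates their features with an attention mechanism. Extensive experiments on video super-resolution, deblurring, and denoising show that the proposed RVRT achieves state-of-the-art performance on benchmark datasets with balanced model size, testing memory, and runtime.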
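The control flow described above, parallel processing inside a clip and recurrence across clips, can be sketched as follows. This is a toy under stated assumptions: frames are single numbers, and adding the mean of the previous clip's features is a placeholder for the guided deformable attention and feature aggregation, which are not reproduced here. The function names are hypothetical.

```python
from typing import List

def split_into_clips(frames: List[float], clip_len: int) -> List[List[float]]:
    """Divide the frame sequence into consecutive clips of `clip_len` frames."""
    return [frames[i:i + clip_len] for i in range(0, len(frames), clip_len)]

def restore_recurrently(frames: List[float], clip_len: int) -> List[List[float]]:
    """Globally recurrent over clips, locally parallel within each clip."""
    prev_summary = 0.0  # no previous clip before the first one
    outputs: List[List[float]] = []
    for clip in split_into_clips(frames, clip_len):
        # Every frame in the clip is updated from the same previous-clip
        # summary, so these updates are independent and could run in parallel.
        clip_features = [frame + prev_summary for frame in clip]
        outputs.append(clip_features)
        # Placeholder for clip-to-clip feature propagation.
        prev_summary = sum(clip_features) / len(clip_features)
    return outputs

features = restore_recurrently([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], clip_len=2)
print(features)
```

With a clip length of 1 this loop degenerates to a purely recurrent method, and with a clip length equal to the video length it degenerates to a purely parallel one, which is the trade-off the abstract describes.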
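Video restoration (e.g., video super-resolution) aims to restore high-quality frames from low-quality frames. Unlike single-image restoration, video restoration generally requires using temporal information from multiple adjacent but usually misaligned video frames. Existing deep methods generally tackle this with a sliding-window strategy or a recurrent architecture, which is either restricted to frame-by-frame restoration or lacks long-range modeling ability. In this paper, we propose a Video Restoration Transformer (VRT) with parallel frame prediction and long-range temporal dependency modeling abilities. More specifically, VRT is composed of multiple scales, each of which consists of two kinds of modules: temporal mutual self-attention (TMSA) and parallel warping. TMSA divides the video into small clips, on which mutual attention is applied for joint motion estimation, feature alignment, and feature fusion, while self-attention is used for feature extraction. To enable cross-clip interactions, the video sequence is shifted for every other layer. In addition, parallel warping is used to further fuse information from neighboring frames through parallel feature warping. Experimental results on five tasks, including video super-resolution, video deblurring, video denoising, video frame interpolation, and space-time video super-resolution, demonstrate that VRT outperforms state-of-the-art methods by large margins ($\textbf{up to 2.16 dB}$) on fourteen benchmark datasets.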
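The clip partitioning and the shift on every other layer can be illustrated with frame indices alone. In this sketch the shift amount (half a clip) and the cyclic wrap-around are assumptions made for illustration, and `partition_clips` is a hypothetical helper; the attention computed inside each clip is not shown.

```python
from typing import List

def partition_clips(frame_ids: List[int], clip_len: int, layer_index: int) -> List[List[int]]:
    """Group frames into clips; odd layers first shift the sequence cyclically."""
    shift = clip_len // 2 if layer_index % 2 == 1 else 0
    rolled = frame_ids[shift:] + frame_ids[:shift]
    return [rolled[i:i + clip_len] for i in range(0, len(rolled), clip_len)]

frame_ids = list(range(8))
even_layer = partition_clips(frame_ids, clip_len=2, layer_index=0)
odd_layer = partition_clips(frame_ids, clip_len=2, layer_index=1)
print(even_layer)  # [[0, 1], [2, 3], [4, 5], [6, 7]]
print(odd_layer)   # [[1, 2], [3, 4], [5, 6], [7, 0]]
```

Frames 1 and 2 never share a clip on even layers but do on odd layers, so stacking the two kinds of layer lets information propagate across clip boundaries.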
Decompilation aims to transform a low-level program language (LPL) (e.g., a binary file) into its functionally equivalent high-level program language (HPL) (e.g., C/C++). It is a core technology in software security, especially in vulnerability discovery and malware analysis. In recent years, with the successful application of neural machine translation (NMT) models in natural language processing (NLP), researchers have tried to build neural decompilers by borrowing the idea of NMT. They formulate the decompilation process as a translation problem between LPL and HPL, aiming to reduce the human cost required to develop decompilation tools and improve their generalizability. However, state-of-the-art learning-based decompilers do not cope well with compiler-optimized binaries. Since real-world binaries are mostly compiler-optimized, decompilers that do not consider optimized binaries have limited practical significance. In this paper, we propose a novel learning-based approach named NeurDP that targets compiler-optimized binaries. NeurDP uses a graph neural network (GNN) model to convert LPL to an intermediate representation (IR), which bridges the gap between source code and optimized binary. We also design an Optimized Translation Unit (OTU) to split functions into smaller code fragments for better translation performance. Evaluation results on datasets containing various types of statements show that NeurDP can decompile optimized binaries with 45.21% higher accuracy than state-of-the-art neural decompilation frameworks.
Nearest-Neighbor (NN) classification has proven to be a simple and effective approach for few-shot learning. The query data can be classified efficiently by finding the nearest support class based on features extracted by pretrained deep models. However, NN-based methods are sensitive to the data distribution and may produce false predictions if the samples in the support set happen to lie near the distribution boundary between classes. To solve this issue, we present P3DC-Shot, an improved nearest-neighbor-based few-shot classification method empowered by prior-driven data calibration. Inspired by the distribution calibration technique, which uses the distribution or statistics of the base classes to calibrate the data for few-shot tasks, we propose a novel discrete data calibration operation that is better suited to NN-based few-shot classification. Specifically, we treat the prototypes representing each base class as priors and calibrate each support sample based on its similarity to different base prototypes. Then, we perform NN classification using these discretely calibrated support data. Results from extensive experiments on various datasets show that our efficient non-learning-based method can outperform, or is at least comparable to, SOTA methods that need additional learning steps.
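The plain NN baseline that the abstract starts from can be sketched as nearest-prototype classification over support features. This shows only the uncalibrated baseline: the prior-driven calibration step is not reproduced, Euclidean distance is an assumption, and the two-dimensional features and helper names are hypothetical stand-ins for features from a pretrained model.

```python
import math
from typing import Dict, List

def class_prototypes(support: Dict[str, List[List[float]]]) -> Dict[str, List[float]]:
    """Average the support features of each class into one prototype."""
    return {
        label: [sum(column) / len(feats) for column in zip(*feats)]
        for label, feats in support.items()
    }

def classify(query: List[float], support: Dict[str, List[List[float]]]) -> str:
    """Return the label of the prototype closest to the query feature."""
    prototypes = class_prototypes(support)
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))

# Hypothetical 2-way 2-shot task.
support = {
    "cat": [[1.0, 0.0], [0.8, 0.2]],
    "dog": [[0.0, 1.0], [0.2, 0.8]],
}
prediction = classify([0.7, 0.3], support)
print(prediction)  # cat
```

If a support sample happened to sit near the boundary between the two classes, its prototype would move toward the other class and queries close to the boundary could flip label, which is the sensitivity the calibration is meant to reduce.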
In recent years, arbitrary image style transfer has attracted more and more attention. Given a pair of content and style images, the goal is a stylized image that retains the content of the former while capturing the style patterns of the latter. However, it is difficult to balance content details against style features. When an image is stylized with sufficient style patterns, the content details may be damaged, and sometimes the objects in the image cannot be distinguished clearly. For this reason, we present a new transformer-based method named STT for image style transfer, together with an edge loss that noticeably enhances content details and avoids the blurred results caused by excessive rendering of style features. Qualitative and quantitative experiments demonstrate that STT achieves performance comparable to state-of-the-art image style transfer methods while alleviating the content leak problem.
In contrast to control-theoretic methods, model-free reinforcement learning (RL) methods still lack a stability guarantee, which remains a significant problem. Jointly learning a policy and a Lyapunov function has recently become a promising approach to providing the whole system with a stability guarantee. However, the classical Lyapunov constraints that researchers have introduced cannot stabilize the system during sampling-based optimization. Therefore, we propose the Adaptive Stability Certification (ASC), which lets the system reach sampling-based stability. Because the ASC condition can search for the optimal policy heuristically, we design the Adaptive Lyapunov-based Actor-Critic (ALAC) algorithm based on it. Our algorithm also avoids an optimization problem of current approaches, in which a variety of constraints are coupled into the objective. When evaluated on ten robotic tasks, our method achieves lower accumulated cost and fewer stability constraint violations than previous studies.
The surrogate loss of variational autoencoders (VAEs) poses various challenges to their training, inducing an imbalance between task fitting and representation inference. To avert this, existing strategies for VAEs focus on adjusting the tradeoff by introducing hyperparameters, deriving a tighter bound under some mild assumptions, or decomposing the loss components for certain neural settings. VAEs still suffer from uncertain tradeoff learning. We propose a novel evolutionary variational autoencoder (eVAE) building on the variational information bottleneck (VIB) theory and integrative evolutionary neural learning. eVAE integrates a variational genetic algorithm into VAE with variational evolutionary operators including variational mutation, crossover, and evolution. Its inner-outer-joint training mechanism synergistically and dynamically generates and updates the uncertain tradeoff learning in the evidence lower bound (ELBO) without additional constraints. Apart from learning a lossy compression and representation of data under the VIB assumption, eVAE presents an evolutionary paradigm to tune critical factors of VAEs and deep neural networks, and addresses the premature convergence and random search problems by integrating evolutionary optimization into deep learning. Experiments show that eVAE addresses the KL-vanishing problem for text generation with low reconstruction loss, generates all disentangled factors with sharp images, and improves image generation quality, respectively. eVAE achieves better reconstruction loss, disentanglement, and generation-inference balance than its competitors.
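For reference, the tradeoff mentioned above can be written out with the standard ELBO, extended by a weighting hyperparameter $\beta$ on the KL term. The notation (encoder $q_\phi$, decoder $p_\theta$, prior $p(z)$, weight $\beta$) is assumed here for illustration and is not taken from the paper; how eVAE sets or evolves the weight is not shown.

$$
\mathcal{L}_{\beta}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - \beta \, D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big)
$$

With $\beta = 1$ this is the usual ELBO. The first term rewards reconstruction (task fitting) and the second regularizes the inferred representation, so a larger $\beta$ favors the regularizer and a smaller one favors reconstruction.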